cuda: reset cuda context after reading memory size by 0cc4m · Pull Request #23935 · ggml-org/llama.cpp

0cc4m · 2026-05-31T06:37:18Z

Overview

Alternative to #23604, to allow reading CUDA memory in the router process in #21231 without allocating permanent memory through an initialized CUDA context. Instead of using NVML, this checks before running cudaMemGetInfo whether the context is already initialized. If not, it releases the context after the call.

I tried ref-counting as well as suggested in #23604 (comment), but that is harder to get right and introduces more edge cases.
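The check-then-release idea described in the overview can be sketched roughly as follows. This is a minimal model, not the code from the PR: `fake_device`, `mem_get_info`, and `reset` are hypothetical stand-ins for the CUDA runtime, so the example runs without a GPU, and the memory sizes are made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one CUDA device; nothing here is the real CUDA API.
struct fake_device {
    bool        context_active   = false;
    std::size_t total_bytes      = 8ull << 30;    // 8 GiB, illustrative value
    std::size_t context_overhead = 256ull << 20;  // memory held by a live context, illustrative

    int resets = 0;  // how often the context was released again

    // Models cudaMemGetInfo: the query lazily creates a context as a side effect.
    void mem_get_info(std::size_t * free_bytes, std::size_t * total) {
        context_active = true;
        *free_bytes = total_bytes - context_overhead;
        *total      = total_bytes;
    }

    // Models releasing the context again (for example through a device reset).
    void reset() {
        context_active = false;
        ++resets;
    }
};

// Query memory, then release the context only if this call created it.
inline void get_memory(fake_device & dev, std::size_t * free_bytes, std::size_t * total) {
    const bool was_active = dev.context_active;  // check before the query
    dev.mem_get_info(free_bytes, total);
    if (!was_active) {
        dev.reset();  // nobody else was using the context, so drop it again
    }
}
```

In this model, a process that only queries memory ends up with no live context, while a process that already had one keeps it untouched.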

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES

ORippler · 2026-06-02T09:30:09Z

> Alternative to #23604, to allow reading CUDA memory in the router process in #21231 without allocating permanent memory through an initialized CUDA context. Instead of using NVML, this checks before running cudaMemGetInfo whether the context is already initialized. If not, it releases the context after the call.

How often will the router process query the available memory? If it's only once at the beginning, I'd suggest a pattern like:

ggml_backend_init -> ggml_backend_device_i.get_memory -> ggml_backend_i.free.

Intuitively, I'd have thought a backend has to be initialized before we can ask it about its available memory. Consequently, we would move the release of the CUDA context into ggml_backend_cuda_free.
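The init, query, free pattern suggested here could look roughly like the sketch below, where the context lives exactly as long as a backend object. All names (`fake_device`, `scoped_backend`, `probe_free_memory`) are hypothetical and the device is simulated; this is not the ggml API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simulated device; the sizes are illustrative.
struct fake_device {
    bool        context_active = false;
    std::size_t free_bytes     = 6ull << 30;
    std::size_t total_bytes    = 8ull << 30;

    // In this model the query is only valid while a context exists.
    bool get_memory(std::size_t * free_out, std::size_t * total_out) const {
        if (!context_active) {
            return false;
        }
        *free_out  = free_bytes;
        *total_out = total_bytes;
        return true;
    }
};

// Hypothetical backend handle: creating it brings the context up,
// destroying it releases the context again (RAII).
class scoped_backend {
public:
    explicit scoped_backend(fake_device & dev) : dev_(dev) { dev_.context_active = true; }
    ~scoped_backend() { dev_.context_active = false; }

    scoped_backend(const scoped_backend &) = delete;
    scoped_backend & operator=(const scoped_backend &) = delete;

private:
    fake_device & dev_;
};

// init -> get_memory -> free, with the release tied to the backend's lifetime.
inline std::size_t probe_free_memory(fake_device & dev) {
    scoped_backend backend(dev);  // "init"
    std::size_t free_b = 0, total_b = 0;
    dev.get_memory(&free_b, &total_b);
    return free_b;                // "free" happens when backend goes out of scope
}
```

The trade-off in this variant is that the caller has to create a backend just to read memory, which is the coupling the rest of the thread argues against.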

0cc4m · 2026-06-02T10:04:27Z

The behaviour of initialisation just to read memory state seems to be unique to CUDA, so I would prefer to handle it inside of the CUDA backend, not outside.

JohannesGaessler · 2026-06-02T18:24:49Z

@ORippler as of right now fetching memory is part of the ggml backend device API (== CUDA device), not the ggml backend API (== CUDA stream). So the lifetime of the CUDA device context cannot be simply tied to the lifetime of a ggml backend unless we move that function. And I would not be in favor of this since the memory in my opinion belongs to the device.

ORippler · 2026-06-03T19:37:12Z

> And I would not be in favor of this since the memory in my opinion belongs to the device.

Still unintuitive to me: what good is a device if I don't have the constructs/context in place to dispatch work to it? But maybe I'm too biased by CUDA on this one 🤷‍♂️

0cc4m · 2026-06-04T11:46:27Z

I forgot that HIP and MUSA were also initially included here. I don't think that is required, so I'll remove it.

Edit: On second thought, that would require wrapping all counter calls in preprocessor checks. Not sure whether that would be better here.
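As a rough illustration of what the "counter calls" and their preprocessor guards might look like, the sketch below counts live users of a device (backends and buffers, going by the commit titles in this PR) and only resets when the count is zero. The struct, the function names, and the `DEVICE_COUNTING` macro are hypothetical, and the reset is simulated by a counter. Holding the mutex across both the check and the reset is one plausible reason to prefer it over a bare atomic: no other thread can register a new user in between.

```cpp
#include <cassert>
#include <mutex>

// Counting is assumed to be needed only for CUDA builds; HIP and MUSA builds compile it out.
#if !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA)
#define DEVICE_COUNTING 1
#else
#define DEVICE_COUNTING 0
#endif

// Hypothetical per-device bookkeeping; not the struct used in the PR.
struct device_state {
    std::mutex mutex;   // guards users and the reset decision together
    int users  = 0;     // live backends plus device/host buffers
    int resets = 0;     // stands in for actual device resets
};

inline void acquire(device_state & dev) {
#if DEVICE_COUNTING
    std::lock_guard<std::mutex> lock(dev.mutex);
    dev.users++;
#else
    (void) dev;
#endif
}

inline void release(device_state & dev) {
#if DEVICE_COUNTING
    std::lock_guard<std::mutex> lock(dev.mutex);
    dev.users--;
#else
    (void) dev;
#endif
}

// Returns true if the device was reset after the (simulated) memory query.
inline bool query_memory(device_state & dev) {
#if DEVICE_COUNTING
    std::lock_guard<std::mutex> lock(dev.mutex);
    // The real memory query would happen here, still under the lock.
    if (dev.users == 0) {
        dev.resets++;   // nothing is using the device, so release the context
        return true;
    }
#else
    (void) dev;
#endif
    return false;
}
```

Funnelling every guard through one macro keeps the `#if` checks in a few helper functions instead of at every call site.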

0cc4m · 2026-06-04T11:54:55Z

I excluded HIP and MUSA; sorry about the noise. @JohannesGaessler Let me know if this looks better.

JohannesGaessler

Thank you!

JohannesGaessler · 2026-06-05T12:14:49Z

I forgot: Ruben and I had an internal discussion about the implementation. The changes in this PR are still unsafe in combination with -sm row, but that option is now obsolete, so I'll make a PR to remove it from the CUDA backend and simplify the cuBLAS code.

0cc4m · 2026-06-06T07:17:11Z

@ggml-org/ggml-cuda Can I get another review?

0cc4m · 2026-06-06T07:54:45Z

I noticed the backend_free function was still moved because it needed access to the device struct, but that was no longer necessary because the struct was moved as well. I changed it back, gonna wait for CI again now.

0cc4m requested a review from a team as a code owner May 31, 2026 06:37

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) May 31, 2026

JohannesGaessler requested changes Jun 2, 2026

Comment thread ggml/src/ggml-cuda/ggml-cuda.cu Outdated

0cc4m added 2 commits June 4, 2026 13:44

cuda: reset device in get_memory function if no backend is active

c1c3b9b

also count device and host buffers

61e659a

0cc4m force-pushed the 0cc4m/cuda-get-memory-device-reset branch from a182b35 to 61e659a Compare June 4, 2026 11:44

0cc4m requested a review from IMbackK as a code owner June 4, 2026 11:44

exclude hip and musa from counting and device reset

94b6291

0cc4m removed the request for review from IMbackK June 4, 2026 11:54

JohannesGaessler reviewed Jun 4, 2026

Comment thread ggml/src/ggml-cuda/ggml-cuda.cu Outdated

Comment thread ggml/src/ggml-cuda/ggml-cuda.cu Outdated

use device mutex instead of atomic

f2f5f24

JohannesGaessler approved these changes Jun 5, 2026

0cc4m mentioned this pull request Jun 5, 2026

cuda: read memory through NVML if available #23604

Closed

JohannesGaessler mentioned this pull request Jun 5, 2026

CUDA: remove -sm row, refactor cuBLAS #24216

Open

am17an approved these changes Jun 6, 2026

undo backend_free function move

3167ccb

0cc4m wants to merge 5 commits into master from 0cc4m/cuda-get-memory-device-reset

4 participants
